Abstract

The mutual information between a complex-valued channel input and its complex-valued output is decomposed 
into four parts based on polar coordinates: an amplitude term, a phase term, and two mixed terms. Numerical results 
for the additive white Gaussian noise (AWGN) channel with various inputs show that, at high signal-to-noise ratio 
(SNR), the amplitude and phase terms dominate the mixed terms. For the AWGN channel with a Gaussian input, 
analytical expressions are derived for high SNR. The decomposition method is applied to partially coherent channels 
and a property of such channels called "spectral loss" is developed. Spectral loss occurs in nonlinear fiber-optic 
channels and it may be one effect that needs to be taken into account to explain the behavior of the capacity of
nonlinear fiber-optic channels presented in recent studies. 

Index Terms 

Mutual information, channel capacity, partially coherent channels, phase noise. 

I. Introduction 

The information encoded in complex-valued signals has two degrees of freedom which are commonly taken to 
be the signal's two quadratures - its real and imaginary parts. Alternatively, the signal can be decomposed into its 
polar coordinates - amplitude and phase. Historically, the first digital modulation constellations with two degrees 
of freedom were a combination of one-dimensional amplitude modulation (AM) and phase modulation (PM) [1].
Quadrature amplitude modulation (QAM), i. e., amplitude modulation of two orthogonal carriers, was not described 
until 1962, with the most significant progress in understanding made in the 1970s [2, Sec. 1.2]. 

The decomposition of complex-valued signals into their real and imaginary parts is the method of choice when the 
sub-channels transporting them have identical form and noise statistics. In particular, this is the case for the AWGN 
channel, e.g., with circularly symmetric Gaussian or square QAM input. In contrast, the "old-fashioned" AM-PM
view can be useful when physical effects act differently on the different sub-channels. Examples are systems that 
clip the amplitude (e.g., nonlinear amplifiers) or systems that introduce phase noise (e.g., phase-locked loops or 
certain nonlinear optical fiber effects). However, even for channels that introduce equal impairments to the signal's 
quadratures (such as the complex-valued AWGN channel), the AM-PM view may be preferable if this facilitates 
the input description, for instance for ASK-PSK modulation schemes. 

Decomposing signals using polar coordinates motivates decomposing the mutual information between the channel 
input and output using polar coordinates. We choose a decomposition that results in four terms: two partial channels, 
each with one degree of freedom (an amplitude and a phase channel), and two mixed terms that govern the 
exchange of mutual information across the sub-channels. We explain and discuss this method in Section II. We
illustrate our results by applying the decomposition to the complex-valued AWGN channel. In Section III we derive
analytical expressions for the AWGN channel with average power constraint (Gaussian input) and with constant 
power constraint (phase-modulated input). In addition, we present decomposition results for discrete ASK/PSK and 
QAM constellations. 

The second part of the paper deals with partially coherent channels, which are essentially AWGN channels with 
additional phase noise. Such channels motivate the development of the polar decomposition method described in 
this paper. The earliest information-theoretic results on channels with reduced degrees of freedom, e.g., transmitters
or receivers that are limited to amplitude modulation (AM) or phase modulation (PM), date back to 1953 [3]. Some
time later, partially coherent channels became an important research topic in the context of phase jitter induced by 
phase demodulation. Good modulation schemes for such channels were presented in [5]. To this date, little is
known about the capacity-achieving input for partially coherent channels [6]. We discuss partially coherent channels
in Section IV and derive an effect we call "spectral loss". At the end of that section, we use the capacity of
fiber-optic communication channels as one application of our results. Finally, in Appendix A we review results from
directional statistics that are useful for understanding phase noise and other circular random processes. 

II. A Polar Decomposition of Mutual Information 
Consider a channel with complex-valued input 

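$$X = X_{\|} \cdot e^{jX_{\angle}}, \qquad X_{\|} \in [0,\infty),\ X_{\angle} \in [-\pi,\pi) \tag{1}$$

and output

$$Y = Y_{\|} \cdot e^{jY_{\angle}}, \qquad Y_{\|} \in [0,\infty),\ Y_{\angle} \in [-\pi,\pi), \tag{2}$$

where the notation $X_{\|}, Y_{\|}$ (amplitudes) and $X_{\angle}, Y_{\angle}$ (phase angles) reminds us of what parts of the signal these
variables refer to. (We use lower-case fonts $x_{\|}$ to denote a realization and calligraphic fonts $\mathcal{X}_{\|}$ to denote the support
of the random variable $X_{\|}$.)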
The mutual information $I(X;Y)$ between this channel's input and output can be expanded by repeatedly applying
the chain rule of mutual information [7, p. 22] as

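$$\begin{aligned}
I(X;Y) &= I(X_{\|},X_{\angle};Y_{\|},Y_{\angle}) \\
&= I(X_{\|};Y_{\|},Y_{\angle}) + I(X_{\angle};Y_{\|},Y_{\angle}|X_{\|}) \\
&= \underbrace{I(X_{\|};Y_{\|})}_{\text{Amplitude term}} + \underbrace{I(X_{\angle};Y_{\angle}|X_{\|})}_{\text{Phase term}} + \underbrace{I(X_{\|};Y_{\angle}|Y_{\|})}_{\text{Mixed term I}} + \underbrace{I(X_{\angle};Y_{\|}|X_{\|},Y_{\angle})}_{\text{Mixed term II}}.
\end{aligned} \tag{3}$$

The expansion (3) can be interpreted as decomposing the complex-valued channel with two degrees of freedom
(amplitude and phase) into two sub-channels with one degree of freedom each.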

The first sub-channel, represented by the amplitude term of the mutual information 

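$$I(X_{\|};Y_{\|}) = \int_{\mathcal{X}_{\|}} p(x_{\|}) \int_{\mathcal{Y}_{\|}} p(y_{\|}|x_{\|}) \log\frac{p(y_{\|}|x_{\|})}{p(y_{\|})}\, dy_{\|}\, dx_{\|} \tag{4}$$

conveys only the amplitude of the signal and is unaffected by impairments such as phase noise.
The second sub-channel is characterized by the phase term of the mutual information

$$\begin{aligned}
I(X_{\angle};Y_{\angle}|X_{\|}) &= \int_{\mathcal{X}_{\|}} p(x_{\|})\, I(X_{\angle};Y_{\angle}|x_{\|})\, dx_{\|} \\
&= \int_{\mathcal{X}_{\|}} p(x_{\|}) \iint_{\mathcal{X}_{\angle},\mathcal{Y}_{\angle}} p(x_{\angle},y_{\angle}|x_{\|}) \log\frac{p(x_{\angle},y_{\angle}|x_{\|})}{p(x_{\angle}|x_{\|})\,p(y_{\angle}|x_{\|})}\, dx_{\angle}\, dy_{\angle}\, dx_{\|} \\
&= \int_{\mathcal{X}_{\|}} p(x_{\|}) \int_{\mathcal{X}_{\angle}} p(x_{\angle}|x_{\|}) \int_{\mathcal{Y}_{\angle}} p(y_{\angle}|x_{\angle},x_{\|}) \log\frac{p(y_{\angle}|x_{\angle},x_{\|})}{p(y_{\angle}|x_{\|})}\, dy_{\angle}\, dx_{\angle}\, dx_{\|} \\
&= \mathcal{E}_{X_{\|}}\{ I(X_{\angle};Y_{\angle}|X_{\|}=x_{\|}) \},
\end{aligned} \tag{5}$$

where $\mathcal{E}_X\{f(X=x)\}$ denotes the expectation of $f(X)$ with respect to the random variable $X$ that takes on the
values $x$. Eq. (5) can be paraphrased in words as the information that can be obtained about the input phase by
observing the output phase, given that the input amplitude is already known. This term is significantly affected by
phase noise, but agnostic to amplitude distortions such as clipping as long as the input amplitude is known.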

After separating the complex-valued channel into an amplitude and a phase part, the two mixed terms (I and
II) in (3) yield the "cross information" between these two sub-channels. Mixed term I represents the amount of
information about the input amplitude that can be drawn from the output phase in addition to what has already been 
learnt about the input amplitude by observing the output amplitude. Finally, mixed term II yields the information 
about the input phase that can be obtained from observation of the output amplitude given the input amplitude and 
the output phase. 

The polar decomposition of mutual information can be helpful in understanding the characteristics of the 
channel input, e.g., concerning symbol constellations, and transmission impairments. Moreover, the decomposition 
significantly simplifies the computation of the mutual information in cases where the mixed terms are zero or 
negligibly small. The computation of $I(X;Y)$ then reduces to evaluating the conditional probability densities in (4)
and (5), which are often known. Even if the mixed terms do not vanish, the two main terms yield a lower bound
on the mutual information (and can hence be used to get a lower bound on capacity). 
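The chain-rule expansion (3) can be checked numerically on a toy example. The sketch below is not part of the original analysis: it assumes a small, randomly generated discrete joint distribution over quantized (amplitude, phase) inputs and outputs, and the helper names `cond_mi` and `polar_decomposition` are hypothetical. It only illustrates that the four terms add up to $I(X;Y)$ and that each term is non-negative.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf stored in an array of any shape."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def cond_mi(p, a, b, c=()):
    """I(A;B|C) in bits for a joint pmf p; a, b, c are tuples of axis indices."""
    def h(axes):
        if not axes:
            return 0.0
        drop = tuple(i for i in range(p.ndim) if i not in axes)
        marginal = p.sum(axis=drop) if drop else p
        return entropy(marginal)
    a, b, c = tuple(a), tuple(b), tuple(c)
    # I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)
    return h(a + c) + h(b + c) - h(a + b + c) - h(c)

def polar_decomposition(p):
    """Four terms of (3) plus I(X;Y); axes are (X_amp, X_phase, Y_amp, Y_phase)."""
    XA, XP, YA, YP = 0, 1, 2, 3
    amp = cond_mi(p, (XA,), (YA,))               # amplitude term
    phase = cond_mi(p, (XP,), (YP,), (XA,))      # phase term
    mix1 = cond_mi(p, (XA,), (YP,), (YA,))       # mixed term I
    mix2 = cond_mi(p, (XP,), (YA,), (XA, YP))    # mixed term II
    total = cond_mi(p, (XA, XP), (YA, YP))       # I(X;Y)
    return amp, phase, mix1, mix2, total

# Toy joint pmf (an assumption for illustration, not a channel from the text)
rng = np.random.default_rng(0)
p = rng.random((3, 4, 3, 4))
p /= p.sum()
amp, phase, mix1, mix2, total = polar_decomposition(p)
print(amp + phase + mix1 + mix2, total)
```

Because every term is a (conditional) mutual information, dropping the two mixed terms can only decrease the sum, which is the lower-bound property stated above.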

III. Decomposition of the AWGN Channel 
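We next apply the decomposition (3) to the complex-valued AWGN channel

$$Y = X + N, \qquad N \sim \mathcal{N}_C(0, 2\sigma_n^2), \tag{6}$$

with the power constraint $\mathcal{E}\{|X|^2\} \le P_s$. This channel's signal-to-noise ratio (SNR), stated here for later reference,
is

$$\mathrm{SNR} = \frac{P_s}{2\sigma_n^2} \tag{7}$$

or (in dB)

$$10 \cdot \log_{10}\left(\frac{P_s}{2\sigma_n^2}\right). \tag{8}$$
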
A. Gaussian Input 

1) Amplitude term: The first term in the decomposition is $I(X_{\|};Y_{\|}) = h(Y_{\|}) - h(Y_{\|}|X_{\|})$. The capacity of
the AWGN channel (6) with average power constraint is achieved by $X \sim \mathcal{N}_C(0,P_s)$ [7, p. 242]. Since
$N \sim \mathcal{N}_C(0,2\sigma_n^2)$, the channel output is Gaussian distributed, $Y \sim \mathcal{N}_C(0,P_s + 2\sigma_n^2)$. Then,
$Y_{\|} = \sqrt{\Re\{Y\}^2 + \Im\{Y\}^2}$ follows a Rayleigh distribution with parameter $(P_s + 2\sigma_n^2)/2$ [8, p. 45]:

$$p(y_{\|}) = \frac{2y_{\|}}{P_s + 2\sigma_n^2} \cdot \exp\left(-\frac{y_{\|}^2}{P_s + 2\sigma_n^2}\right). \tag{9}$$

The differential entropy of the output amplitude in bits is [7, p. 487]

$$h(Y_{\|}) = \frac{1}{2}\log_2(P_s + 2\sigma_n^2) + \frac{\gamma}{2\ln 2} + \frac{1}{\ln 2} - 1, \tag{10}$$

where $\gamma \approx 0.577$ is the Euler constant.
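As a sanity check on (10), the Rayleigh entropy can be compared with a direct numerical evaluation of $-\int p \log_2 p$ for the PDF (9). The sketch below assumes (10) reads $\frac12\log_2(P_s+2\sigma_n^2) + \gamma/(2\ln 2) + 1/\ln 2 - 1$; the function names and the values $P_s = 10$, $\sigma_n^2 = 1$ are illustrative choices, not taken from the text.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def rayleigh_entropy_bits(p_s, sigma_n2):
    """Closed form assumed for (10): entropy of |Y| with Y ~ N_C(0, P_s + 2*sigma_n^2)."""
    total = p_s + 2.0 * sigma_n2
    return (0.5 * np.log2(total) + EULER_GAMMA / (2.0 * np.log(2.0))
            + 1.0 / np.log(2.0) - 1.0)

def rayleigh_entropy_numeric(p_s, sigma_n2, n=200_001):
    """Riemann-sum evaluation of -int p(y) log2 p(y) dy using the PDF (9)."""
    total = p_s + 2.0 * sigma_n2
    y = np.linspace(1e-9, 12.0 * np.sqrt(total), n)
    pdf = 2.0 * y / total * np.exp(-y**2 / total)
    mask = pdf > 0                      # guard against log2(0) if the tail underflows
    integrand = np.zeros_like(pdf)
    integrand[mask] = -pdf[mask] * np.log2(pdf[mask])
    return float(np.sum(integrand) * (y[1] - y[0]))

closed_form = rayleigh_entropy_bits(10.0, 1.0)
numeric = rayleigh_entropy_numeric(10.0, 1.0)
print(closed_form, numeric)
```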

Calculating $h(Y_{\|}|X_{\|})$ requires knowledge of $p(y_{\|}|x_{\|})$, which is a Ricean distribution [8, p. 47]:

$$p(y_{\|}|x_{\|}) = \frac{y_{\|}}{\sigma_n^2} \cdot \exp\left(-\frac{y_{\|}^2 + x_{\|}^2}{2\sigma_n^2}\right) \cdot I_0\left(\frac{x_{\|}\,y_{\|}}{\sigma_n^2}\right), \tag{11}$$

where $I_0(\cdot)$ is the modified Bessel function of the first kind with order zero. It can be seen that for $x_{\|} = 0$, the Ricean
distribution turns into a Rayleigh distribution; for $P_s = 0$, (9) and (11) are equal. Using the general form (11) of the
conditional PDF, the integration required to calculate $h(Y_{\|}|X_{\|})$ is intractable. A significant simplification is obtained
when the channel's SNR (7) is large. In this limit of large arguments of the Bessel function ($x_{\|}y_{\|}/\sigma_n^2 \gg 1$), we
can use $I_0(z) \approx e^z/\sqrt{2\pi z}$ [9, p. 377]. The Ricean PDF (11) then turns into the Gaussian PDF

$$p(y_{\|}|x_{\|}) \approx \frac{1}{\sigma_n\sqrt{2\pi}} \cdot \exp\left(-\frac{(y_{\|} - x_{\|})^2}{2\sigma_n^2}\right). \tag{12}$$

In deriving (12), we dropped a factor $\sqrt{y_{\|}/x_{\|}}$, which tends to 1 asymptotically with increasing SNR. With (12),
the conditional differential entropy can be calculated as

$$h(Y_{\|}|X_{\|}) \approx \frac{1}{2}\log_2(2\pi e\,\sigma_n^2). \tag{13}$$
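To see how quickly the Gaussian approximation (13) becomes accurate, the entropy of the Ricean PDF (11) can be integrated numerically for a fixed input amplitude. The following is a minimal sketch under stated assumptions: the function names are hypothetical, `np.i0` is NumPy's modified Bessel function of order zero, and the rearranged exponent limits the sketch to $x_{\|}y_{\|}/\sigma_n^2$ below roughly 700.

```python
import numpy as np

def rice_pdf(y, x_amp, sigma_n):
    """Ricean PDF (11), rearranged so that the exponentials stay bounded."""
    z = x_amp * y / sigma_n**2
    scaled_bessel = np.i0(z) * np.exp(-z)   # exp(-z) * I0(z); fine for z below ~700
    return y / sigma_n**2 * np.exp(-(y - x_amp)**2 / (2.0 * sigma_n**2)) * scaled_bessel

def rice_entropy_bits(x_amp, sigma_n, n=20_001):
    """Riemann-sum estimate of h(Y_amp | X_amp = x_amp) in bits."""
    y = np.linspace(max(0.0, x_amp - 10.0 * sigma_n), x_amp + 10.0 * sigma_n, n)
    pdf = rice_pdf(y, x_amp, sigma_n)
    mask = pdf > 0
    integrand = np.zeros_like(pdf)
    integrand[mask] = -pdf[mask] * np.log2(pdf[mask])
    return float(np.sum(integrand) * (y[1] - y[0]))

def gaussian_entropy_bits(sigma_n):
    """High-SNR approximation (13)."""
    return 0.5 * np.log2(2.0 * np.pi * np.e * sigma_n**2)

for x_amp in (0.0, 1.0, 5.0, 20.0):
    print(x_amp, rice_entropy_bits(x_amp, 1.0), gaussian_entropy_bits(1.0))
```

For small $x_{\|}$ the exact entropy lies clearly below the Gaussian value (the PDF is Rayleigh at $x_{\|} = 0$), which is why (13) is only used in the high-SNR regime.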
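Finally, using (13) and (10), an asymptotic approximation for the amplitude term is

$$\begin{aligned}
I(X_{\|};Y_{\|}) &= h(Y_{\|}) - h(Y_{\|}|X_{\|}) \\
&\approx \frac{1}{2}\log_2\left(1 + \frac{P_s}{2\sigma_n^2}\right) - \frac{1}{2}\log_2\pi + \frac{1+\gamma}{2\ln 2} - 1 \\
&\approx \frac{1}{2}\log_2\frac{P_s}{2\sigma_n^2} \underbrace{- \frac{1}{2}\log_2\pi + \frac{1+\gamma}{2\ln 2} - 1}_{\approx -0.69}, \qquad P_s \gg 2\sigma_n^2.
\end{aligned} \tag{14}$$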
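The constant in (14) can be verified by hand. The worked step below is a sketch that assumes (10) reads $\frac12\log_2(P_s+2\sigma_n^2) + \gamma/(2\ln 2) + 1/\ln 2 - 1$ and (13) reads $\frac12\log_2(2\pi e\,\sigma_n^2)$; it subtracts the two entropies and collects the constants:

```latex
\begin{align*}
h(Y_{\|}) - h(Y_{\|}|X_{\|})
 &\approx \tfrac12\log_2(P_s + 2\sigma_n^2) + \frac{\gamma}{2\ln 2} + \frac{1}{\ln 2} - 1
   - \tfrac12\log_2(2\pi e\,\sigma_n^2) \\
 &= \tfrac12\log_2\frac{P_s + 2\sigma_n^2}{2\sigma_n^2} - \tfrac12\log_2\pi
   - \frac{1}{2\ln 2} + \frac{\gamma}{2\ln 2} + \frac{1}{\ln 2} - 1 \\
 &= \tfrac12\log_2\Bigl(1 + \frac{P_s}{2\sigma_n^2}\Bigr) - \tfrac12\log_2\pi
   + \frac{1+\gamma}{2\ln 2} - 1, \\[4pt]
 -\tfrac12\log_2\pi + \frac{1+\gamma}{2\ln 2} - 1
 &\approx -0.826 + 1.138 - 1 \approx -0.69 .
\end{align*}
```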

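2) Phase term: The phase term $I(X_{\angle};Y_{\angle}|X_{\|}) = h(Y_{\angle}|X_{\|}) - h(Y_{\angle}|X_{\angle},X_{\|})$ is calculated similarly. For any
input amplitude $x_{\|}$, the output phase is uniformly distributed in $[-\pi,\pi)$, so the first conditional entropy is easily
found to be

$$\begin{aligned}
h(Y_{\angle}|X_{\|}) &= -\iint p(x_{\|},y_{\angle}) \log_2 p(y_{\angle}|x_{\|})\, dy_{\angle}\, dx_{\|} \\
&= -\int p(x_{\|}) \underbrace{\int p(y_{\angle}|x_{\|}) \log_2 p(y_{\angle}|x_{\|})\, dy_{\angle}}_{-\log_2(2\pi)}\, dx_{\|} \\
&= \log_2(2\pi).
\end{aligned} \tag{15}$$

Similarly, we can write

$$h(Y_{\angle}|X_{\angle},X_{\|}) = -\iint p(x_{\|},x_{\angle}) \underbrace{\int p(y_{\angle}|x_{\|},x_{\angle}) \log_2 p(y_{\angle}|x_{\|},x_{\angle})\, dy_{\angle}}_{-h(Y_{\angle}|x_{\|},x_{\angle})}\, dx_{\|}\, dx_{\angle}. \tag{16}$$

The conditional entropy $h(Y_{\angle}|x_{\|},x_{\angle})$ is not affected by the constant phase shift $x_{\angle}$, so that we can assume $x_{\angle} = 0$
without loss of generality and write the conditional phase PDF as [8], [10], [11]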

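$$p(y_{\angle}|x_{\|},x_{\angle}=0) = \frac{1}{2\pi}\exp\left(-\frac{x_{\|}^2}{2\sigma_n^2}\right) + \frac{x_{\|}\cos y_{\angle}}{2\sigma_n\sqrt{2\pi}} \exp\left(-\frac{x_{\|}^2\sin^2 y_{\angle}}{2\sigma_n^2}\right) \mathrm{erfc}\left(-\frac{x_{\|}\cos y_{\angle}}{\sigma_n\sqrt{2}}\right). \tag{17}$$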

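The PDF (17) is periodic with a period of $2\pi$; integrating it over any contiguous $2\pi$ interval yields one. Such
circular PDFs are reviewed in Appendix A.

If the channel SNR is low and we have $x_{\|}^2 \ll 2\sigma_n^2$, the phase becomes uniformly distributed. On the other hand,
when $x_{\|}^2 \gg 2\sigma_n^2$, (17) can be approximated by the Gaussian PDF [8, p. 273]

$$p(y_{\angle}|x_{\|},x_{\angle}=0) \approx \frac{x_{\|}}{\sigma_n\sqrt{2\pi}} \exp\left(-\frac{x_{\|}^2\,y_{\angle}^2}{2\sigma_n^2}\right). \tag{18}$$

With this approximation, the inner entropy integral in (16) can be approximated as

$$h(Y_{\angle}|x_{\|},x_{\angle}) \approx \frac{1}{2}\log_2\left(2\pi e\,\frac{\sigma_n^2}{x_{\|}^2}\right), \qquad x_{\|}^2 \gg 2\sigma_n^2, \tag{19}$$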
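The quality of the high-SNR approximation (19) can be checked by integrating the entropy of the exact phase PDF numerically. The sketch below uses the standard closed form for the phase of a constant phasor in complex Gaussian noise (stated here as an assumption about the form of (17)), and the function names are hypothetical.

```python
import math
import numpy as np

def phase_pdf(theta, x_amp, sigma_n):
    """PDF of the phase of x_amp + N with N ~ N_C(0, 2*sigma_n^2) (standard result)."""
    rho = x_amp**2 / (2.0 * sigma_n**2)
    c = np.cos(theta)
    erfc_term = np.array([math.erfc(-math.sqrt(rho) * ci) for ci in c])
    return (np.exp(-rho) / (2.0 * np.pi)
            + 0.5 * np.sqrt(rho / np.pi) * c * np.exp(-rho * np.sin(theta)**2) * erfc_term)

def phase_entropy_bits(x_amp, sigma_n, n=20_000):
    """Riemann-sum estimate of h(Y_phase | x_amp, x_phase = 0) in bits."""
    theta = -np.pi + 2.0 * np.pi * np.arange(n) / n
    pdf = phase_pdf(theta, x_amp, sigma_n)
    mask = pdf > 0
    integrand = np.zeros_like(pdf)
    integrand[mask] = -pdf[mask] * np.log2(pdf[mask])
    return float(np.sum(integrand) * (2.0 * np.pi / n))

def phase_entropy_approx_bits(x_amp, sigma_n):
    """High-SNR approximation (19)."""
    return 0.5 * np.log2(2.0 * np.pi * np.e * sigma_n**2 / x_amp**2)

for x_amp in (1.0, 3.0, 10.0):
    print(x_amp, phase_entropy_bits(x_amp, 1.0), phase_entropy_approx_bits(x_amp, 1.0))
```

At $x_{\|} = 0$ the PDF reduces to the uniform density, so the numerical entropy equals $\log_2(2\pi)$, in line with the low-SNR remark above.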
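and the entropy (16) becomes

$$\begin{aligned}
h(Y_{\angle}|X_{\angle},X_{\|}) &= \int p(x_{\angle}) \int p(x_{\|}|x_{\angle})\, h(Y_{\angle}|x_{\|},x_{\angle})\, dx_{\|}\, dx_{\angle} \\
&= \int p(x_{\angle})\, dx_{\angle} \cdot \int_0^{\infty} p(x_{\|})\, h(Y_{\angle}|x_{\|},x_{\angle})\, dx_{\|} \\
&\approx \int_0^{\infty} \frac{x_{\|}}{P_s/2} \exp\left(-\frac{x_{\|}^2}{P_s}\right) \cdot \frac{1}{2}\log_2\left(2\pi e\,\frac{\sigma_n^2}{x_{\|}^2}\right) dx_{\|}.
\end{aligned} \tag{20}$$

The separation of the integrals (second equality) is possible because $h(Y_{\angle}|x_{\|},x_{\angle})$ is independent of $x_{\angle}$. In the
same line, we used $p(x_{\|}|x_{\angle}) = p(x_{\|})$, which is a Rayleigh distribution.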
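The remaining integral in (20) can be evaluated in closed form. The worked step below is a sketch, assuming that the integrand of (20) is $p(x_{\|}) \cdot \frac12\log_2(2\pi e\,\sigma_n^2/x_{\|}^2)$ and using two standard facts: for $X \sim \mathcal{N}_C(0,P_s)$ the squared amplitude $X_{\|}^2$ is exponentially distributed with mean $P_s$, and the expected logarithm of an exponential variable with mean $\mu$ is $\ln\mu - \gamma$.

```latex
\begin{align*}
h(Y_{\angle}|X_{\angle},X_{\|})
 &\approx \mathcal{E}\Bigl\{\tfrac12\log_2\Bigl(2\pi e\,\frac{\sigma_n^2}{X_{\|}^2}\Bigr)\Bigr\}
  = \tfrac12\log_2(2\pi e\,\sigma_n^2) - \frac{\mathcal{E}\{\ln X_{\|}^2\}}{2\ln 2} \\
 &= \tfrac12\log_2(2\pi e\,\sigma_n^2) - \frac{\ln P_s - \gamma}{2\ln 2}
  = \tfrac12\log_2\Bigl(2\pi e\,\frac{\sigma_n^2}{P_s}\Bigr) + \frac{\gamma}{2\ln 2}.
\end{align*}
```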

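Finally, the decomposition phase term can be approximated from (15) and (20):

$$I(X_{\angle};Y_{\angle}|X_{\|}) \approx \frac{1}{2}\log_2\frac{P_s}{2\sigma_n^2} + \frac{1}{2}\log_2\pi - \frac{1+\gamma}{2\ln 2} + 1, \qquad P_s \gg 2\sigma_n^2. \tag{21}$$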



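3) Mixed terms: For the AWGN channel with Gaussian input, mixed term I in the decomposition is always zero.
To prove this, observe that $p(y_{\angle}) = p(y_{\angle}|y_{\|}) = p(y_{\angle}|x_{\|},y_{\|}) = 1/(2\pi)$ within any $2\pi$ interval and zero outside. Then,
we obtain the conditional entropies

$$h(Y_{\angle}|Y_{\|}) = -\int_0^{\infty} p(y_{\|}) \underbrace{\int_{-\pi}^{\pi} p(y_{\angle}|y_{\|}) \log_2 p(y_{\angle}|y_{\|})\, dy_{\angle}}_{-\log_2(2\pi)}\, dy_{\|} = \int_0^{\infty} p(y_{\|})\, dy_{\|} \cdot \log_2(2\pi) = \log_2(2\pi) \tag{22}$$

and

$$\begin{aligned}
h(Y_{\angle}|Y_{\|},X_{\|}) &= -\int_0^{\infty}\!\!\int_0^{\infty} p(x_{\|},y_{\|}) \underbrace{\int_{-\pi}^{\pi} p(y_{\angle}|x_{\|},y_{\|}) \log_2 p(y_{\angle}|x_{\|},y_{\|})\, dy_{\angle}}_{-\log_2(2\pi)}\, dx_{\|}\, dy_{\|} \\
&= \int_0^{\infty}\!\!\int_0^{\infty} p(x_{\|},y_{\|})\, dx_{\|}\, dy_{\|} \cdot \log_2(2\pi) = \log_2(2\pi),
\end{aligned} \tag{23}$$

and so

$$I(X_{\|};Y_{\angle}|Y_{\|}) = h(Y_{\angle}|Y_{\|}) - h(Y_{\angle}|Y_{\|},X_{\|}) = 0. \tag{24}$$

Mixed term II, $I(X_{\angle};Y_{\|}|X_{\|},Y_{\angle})$, reaches its (numerically calculated) maximum value of approximately 0.08
bits/symbol at $10\log_{10}(P_s/(2\sigma_n^2)) = 1$ dB and tends to zero for large SNRs.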

The results of the decomposition for the AWGN channel with Gaussian input are shown in Fig. 1. The depicted
curves were obtained from numerical integration of the mutual information integrals; markers indicate the analytical
approximations (14) and (21). Observe that the amplitude and phase terms are the main contributors to the
channel capacity, whereas mixed term II (shown in the inset) is negligibly small. It can be seen that the analytical
approximations are accurate at SNRs of approximately 15 dB and higher. At high SNRs, both mixed terms are
(exactly or near) zero and the amplitude and phase terms add up to the full capacity, as expected from (14) and
(21).

Fig. 1. Mutual information decomposition terms as a function of SNR in dB (8) for the AWGN channel with Gaussian input. Lines show
numerical results, markers correspond to analytical approximations (14) and (21). The inset shows the magnified curve of mixed term II.
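The statement that the two asymptotic terms add up to the full capacity can be checked with a few lines of arithmetic. This sketch assumes that (14) reads $\frac12\log_2\mathrm{SNR} - \frac12\log_2\pi + (1+\gamma)/(2\ln 2) - 1$ and (21) reads $\frac12\log_2\mathrm{SNR} + \frac12\log_2\pi - (1+\gamma)/(2\ln 2) + 1$, with $\mathrm{SNR} = P_s/(2\sigma_n^2)$; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329
# Constant shared (with opposite sign) by the two asymptotes, about -0.69
AMPLITUDE_OFFSET = -0.5 * math.log2(math.pi) + (1.0 + EULER_GAMMA) / (2.0 * math.log(2.0)) - 1.0

def amplitude_term_asymptote(snr):
    """High-SNR amplitude term, as assumed for (14); snr is linear, not in dB."""
    return 0.5 * math.log2(snr) + AMPLITUDE_OFFSET

def phase_term_asymptote(snr):
    """High-SNR phase term, as assumed for (21); snr is linear, not in dB."""
    return 0.5 * math.log2(snr) - AMPLITUDE_OFFSET

def awgn_capacity(snr):
    """Capacity of the complex AWGN channel in bits/symbol."""
    return math.log2(1.0 + snr)

for snr_db in (5, 15, 30):
    snr = 10.0 ** (snr_db / 10.0)
    total = amplitude_term_asymptote(snr) + phase_term_asymptote(snr)
    print(snr_db, total, awgn_capacity(snr))
```

The constants cancel exactly, so the sum is $\log_2\mathrm{SNR}$, and the remaining gap to $\log_2(1+\mathrm{SNR})$ shrinks as the SNR grows.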

It is noteworthy that the complex Gaussian input, which maximizes $I(X;Y)$, does not maximize the single
decomposition terms independently. The amplitude term $I(X_{\|};Y_{\|})$, for instance, is maximized by a "half-Gaussian"
rather than a Rayleigh distribution at large SNRs, see Section IV-B.

B. Phase-modulated Input 

The terms constant-intensity, constant-envelope or ring modulation are used in the literature to characterize the
input of a system which encodes information only in the phase of the transmitted signal. Results on the capacity
of constant-intensity channels in the presence of AWGN have been reported over a period of 50 years, e.g., [3],
[10]-[14]. The capacity of a channel constrained to constant intensity ("continuous PSK") is an upper limit on the
rates achievable with discrete PSK constellations.

An important detail in the definition of phase-modulated AWGN channels is whether the receiver has access 
to amplitude and phase of the received signal or to the phase only. Although it has been observed [11] that both
capacities are equal in the limit of large SNRs, evaluating the capacity difference at lower SNR values has remained 
an open problem. 

Performing a polar decomposition (3) of the phase-modulated AWGN channel is the key to shed light on this
question. As no information is encoded in $X_{\|} = \sqrt{P_s} = \text{const.}$, the amplitude term and the mixed term I of the
decomposition equal zero.

As expected for a phase-modulated system, the phase term conveys the greatest share of the transmitted information.
In the absence of amplitude modulation, this term can be written as $I(X_{\angle};Y_{\angle}|X_{\|}=\sqrt{P_s}) = h(Y_{\angle}) - h(Y_{\angle}|X_{\angle})$.
The capacity-achieving input distribution is uniform in $[-\pi,\pi)$ [12]. Hence, $Y_{\angle}$ is uniformly distributed, too, and
$h(Y_{\angle}) = \log_2(2\pi)$. To calculate $h(Y_{\angle}|X_{\angle})$, the entropy integral has to be solved for the conditional phase PDF
(17). An asymptotic approximation can be found for large SNRs, where (17) can be replaced by its Gaussian
approximation (18). The conditional differential entropy $h(Y_{\angle}|X_{\angle})$ then approaches (19), and the decomposition
phase term becomes

$$\begin{aligned}
I(X_{\angle};Y_{\angle}|X_{\|}=\sqrt{P_s}) &= h(Y_{\angle}) - h(Y_{\angle}|X_{\angle}) \\
&\approx \log_2(2\pi) - \frac{1}{2}\log_2\left(2\pi e\,\frac{\sigma_n^2}{P_s}\right) \\
&\approx \frac{1}{2}\log_2\frac{P_s}{2\sigma_n^2} + 1.1 \text{ bits}, \qquad P_s \gg 2\sigma_n^2.
\end{aligned} \tag{25}$$

Hence, the capacity of the phase-modulated AWGN channel is approximately 1.1 bits/symbol larger than half that
of the AWGN channel with Gaussian input for large SNRs.
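The offset of about 1.1 bits follows directly from the two entropies. As a short worked step (assuming, as in (25), $h(Y_{\angle}) = \log_2(2\pi)$ and $h(Y_{\angle}|X_{\angle}) \approx \frac12\log_2(2\pi e\,\sigma_n^2/P_s)$):

```latex
\begin{align*}
\log_2(2\pi) - \tfrac12\log_2\Bigl(2\pi e\,\frac{\sigma_n^2}{P_s}\Bigr)
 &= \tfrac12\log_2\frac{(2\pi)^2 P_s}{2\pi e\,\sigma_n^2}
  = \tfrac12\log_2\frac{P_s}{2\sigma_n^2} + \tfrac12\log_2\frac{4\pi}{e}, \\
\tfrac12\log_2\frac{4\pi}{e} &\approx \tfrac12\log_2(4.62) \approx 1.10 .
\end{align*}
```

Since the Gaussian-input capacity is approximately $\log_2(P_s/(2\sigma_n^2))$ at high SNR, the first term is half of it, and the second term is the 1.1 bits/symbol quoted above.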

Finally, mixed term II, $I(X_{\angle};Y_{\|}|X_{\|}=\sqrt{P_s},Y_{\angle})$, represents the (small) amount of information that can be gained
by receiving the signal amplitude and phase rather than the phase only. Fig. 2 shows the decomposition terms as a
function of SNR; the phase term markers indicate the asymptotic approximation (25), which is accurate at SNRs
greater than 15 dB.

C. Discrete Input Constellations 

In practical communication systems, the input consists of points from a discrete alphabet rather than of continuous 
values. Performing the polar decomposition for these discrete inputs is useful in two ways: 

• The decomposition can help to adapt constellations to certain channel characteristics. For example, it may be
beneficial for channels impaired by strong phase noise to re-arrange the points of a constellation in a way that
the amplitude term is increased at the expense of the phase term. While the overall capacity may be hardly
affected in the absence of phase noise, an increased capacity is obtained in the presence of phase noise. An
example for this situation can be found in [15], where 8-PSK is compared to 8-OOK-PSK (7-PSK plus a point
at the origin) and 8-star-QAM in the presence of fading and phase noise. The decomposition could help to
accelerate this search for good constellations and possibly to make it more systematic.
• When determining the mutual information numerically, the computational complexity can be significantly
reduced by calculating the amplitude and phase terms rather than the full mutual information. This approach
requires both mixed terms to be negligibly small.

In the following, decomposition results are given for some exemplary modulation schemes.

Fig. 2. Mutual information decomposition terms as a function of SNR in dB (8) for the AWGN channel with constant-intensity (continuous
ring) input. Lines show numerical results, markers correspond to analytical approximation (25).



1) Modulation using one degree of freedom: As examples of modulation schemes where either amplitude or
phase is modulated, Fig. 3 shows the decomposition of on-off keying (OOK), i.e., $X \in \{0, 1\}$, and phase-shift
keying (PSK) with $M = 16$ phase levels.



[Figure 3: two panels showing the curves $I(X;Y)$, $I(X_{\|};Y_{\|})$, $I(X_{\angle};Y_{\angle}|X_{\|})$, $I(X_{\|};Y_{\angle}|Y_{\|})$ and $I(X_{\angle};Y_{\|}|X_{\|},Y_{\angle})$ versus SNR in dB.]

Fig. 3. Polar decomposition of mutual information for OOK (left) and 16-PSK (right).



As the input phase carries no information with OOK, the phase term and the mixed term II are zero. The 
amplitude term yields the amount of information available when only the signal amplitude is received and processed. 
An example for such a system is the direct-detection receiver used in optical communication systems, where the
photodiode responds to the incident light power [16, Ch. 4]. Receivers that have access to the full signal (amplitude
and phase) can extract additional information about the input amplitude from the output phase. This information gain 
is reflected in the mixed term I (dotted line). In the optical communications example, this gain can be obtained by 
upgrading an optical OOK system from direct to coherent detection. At SNRs larger than 10 dB, all the information 
is contained in the received amplitude, so that receiving the signal phase does not yield any additional information. 

For the PSK input, the amplitude term and the mixed term I are zero. A phase-only receiver captures most of the 
available information; the (rather small) gain that is obtained from additional amplitude reception at SNRs below 
10 dB is captured by the mixed term II (dash-dotted line). 

2) Combined ASK-PSK modulation: The simultaneous digital modulation of both amplitude and phase was first 
proposed in 1960 [1]. Examples for this type of constellation, which later became known as the Type I or star-QAM
constellation, are shown in Fig. 4. The constellations depicted in the left column are combinations of 4 amplitude
levels and 4, 8, and 16 phase levels, respectively. The constellations shown in the right column are modifications 
of these ASK/PSK schemes, where an additional phase offset is introduced between adjacent amplitude levels, thus 
increasing the minimum distance between neighboring constellation points. 

The decomposition results shown in Fig. 5 illustrate the capacity gain obtained from the phase offset in the
right-column constellations of Fig. 4. As the joint amplitude PDF $p(x_{\|},y_{\|})$ remains unaffected by the phase offset, the
amplitude term (red line) is equal for both constellations (compare plots on the left and on the right side of Fig. 5).
Similarly, the conditional joint phase PDF $p(x_{\angle},y_{\angle}|x_{\|})$ only experiences a constant shift (along the $x_{\angle}$-axis) for
amplitude levels with phase offset, which does not change the decomposition phase term (blue line). The capacity 
gain achieved by the phase offset is reflected in the increase of the mixed term I (magenta line; cf. top left and top 
right plots); this gain decreases with increasing number of phase levels (top to bottom). 

By letting the number of phase levels go to infinity, the constellation turns into continuous concentric rings 
and the mixed term I tends towards zero. Such modulation schemes with a discrete number of amplitude levels 
and continuous phase angles (so-called ring modulation) were used in an extensive numerical study to estimate 
the capacity of nonlinear fiber-optic channels [17]. As for the constellations discussed above (and most other
constellations), the mixed term II (green line) is negligibly small but non-zero for ASK/PSK constellations, too. 

3) QAM: The polar decomposition results for $M$-QAM constellations with $M = 4, 16, 64, 256, 512, 1024$ are
shown in Fig. 6. It can be seen that the amplitude and phase terms saturate at $H(X_{\|})$ and $H(X_{\angle}|X_{\|})$, respectively.
For instance, 16-QAM has three distinct amplitude levels with four or eight distinct phase levels each, so the
decomposition terms tend towards

$$H(X_{\|}) = -\frac{1}{4}\log_2\left(\frac{1}{4}\right) - \frac{1}{2}\log_2\left(\frac{1}{2}\right) - \frac{1}{4}\log_2\left(\frac{1}{4}\right) = 1.5 \text{ bits} \tag{26}$$

and

$$H(X_{\angle}|X_{\|}) = \frac{1}{4}\log_2 4 + \frac{1}{2}\log_2 8 + \frac{1}{4}\log_2 4 = 2.5 \text{ bits}. \tag{27}$$
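The saturation values in (26) and (27) can be reproduced for any equiprobable constellation by grouping the points into rings of equal amplitude. The sketch below is illustrative only: the helper names `square_qam` and `polar_entropies` are hypothetical, and a square grid with odd-integer coordinates is assumed for the QAM points.

```python
from collections import Counter

import numpy as np

def square_qam(m):
    """Points of a square M-QAM constellation on the odd-integer grid."""
    k = int(round(np.sqrt(m)))
    if k * k != m:
        raise ValueError("m must be a perfect square")
    levels = np.arange(-(k - 1), k, 2)
    return np.array([complex(a, b) for a in levels for b in levels])

def polar_entropies(points):
    """Return (H(X_amp), H(X_phase | X_amp)) in bits for equiprobable points."""
    n = len(points)
    # Count how many points share each amplitude (rounded to absorb float noise)
    ring_sizes = Counter(np.round(np.abs(points), 9)).values()
    h_amp = -sum(c / n * np.log2(c / n) for c in ring_sizes)
    # Points on one ring have distinct, equiprobable phases
    h_phase = sum(c / n * np.log2(c) for c in ring_sizes)
    return float(h_amp), float(h_phase)

for m in (4, 16, 64):
    print(m, polar_entropies(square_qam(m)))
```

The two values always sum to $\log_2 M$, since amplitude and phase together identify the constellation point.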
Among the considered QAM constellations, 4-QAM is a special case in the sense that its mixed term I is zero;
being a PSK format, its decomposition resembles that of 16-PSK depicted in Fig. 3. For $M > 4$, QAM constellations
exhibit a significant mixed term I, so that in the analysis of this modulation scheme, the mutual information may
not be approximated by the sum of the amplitude and phase terms only. Again, mixed term II is non-zero but
negligibly small for all QAM constellations.

Fig. 5. Polar decomposition of mutual information for ASK/PSK constellations depicted in Fig. 4 without phase offset (left) and with phase
offset (right).

Fig. 6. Polar decomposition of mutual information for $M$-QAM constellations with (from top left to bottom right) $M =
4, 16, 64, 256, 512, 1024$.
IV. Partially Coherent Channels

In the preceding section, the transmitted phase (and, of course, the amplitude, too) was corrupted by AWGN. If
the signal is impaired by phase noise (in addition to AWGN), the channel is only partially able to convey phase
information even in the absence of AWGN. Such channels are called¹ partially coherent [6]. Partially coherent
channels can be described in continuous-time form by

$$Y(t) = X(t) \cdot e^{j\Theta(t)} + N(t), \tag{28}$$

where $N(t)$ is a complex-valued AWGN process with variance $2\sigma_n^2$ and $\Theta(t)$ models the phase noise process. We
can differentiate various types of phase noise appearing in communication systems: 

• The carrier itself as well as the local oscillator used for demodulation have random noise fluctuations. This
type of phase noise occurs in lightwave communication systems where the laser phase performs a random
walk (e.g., Brownian motion). The nonzero laser linewidth can broaden the signal spectrum so that spectrally
sensitive operations (filtering, sampling) require special attention. References on laser phase noise and related
system aspects include [20]-[24] and many references therein.

• Another type of correlated phase noise emerges when the carrier phase is imperfectly tracked at the receiver
(e.g., in a phase-locked loop [6], [16]). In this case, samples from the phase noise process $\Theta(t)$ are usually
assumed to have a von Mises (Tikhonov) distribution (52).

• Uncorrelated (white) phase noise can be used to model the nonlinear effect of cross-phase modulation (XPM)
[25, Ch. 7] in multi-channel fiber-optic communication systems. In this case, the phase noise samples follow
a wrapped Gaussian distribution (49) as explained in Sec. IV-D.

• Signal-dependent phase noise is also found in fiber-optic communication systems, where it is produced by the
nonlinear effect of self-phase modulation (SPM) [26, Ch. 5]. SPM induces a phase shift that is proportional
to the instantaneous power of the propagating optical wave (including signal and noise) [25, Ch. 4].

In general, all types of phase noise are capable of broadening the spectrum of the transmitted signal $X(t)$. This
spectral broadening is the major obstacle in transforming (28) into a discrete-time channel model. Filtering (and
sampling) a signal whose spectrum is broadened by phase noise can result (1) in signal distortions and energy
loss when the filter is narrow [21], [24] and (2) in an increased captured noise power when the filter bandwidth is
wide (see [24] and references therein). These effects can be neglected when the spectral broadening is moderate,
which is the case for strongly correlated phase noise processes. Filtering and sampling at the symbol rate is then
possible and leads to discrete-time channel models that have independent and identically distributed (i.i.d.) signal
and noise samples, but correlated phase noise samples (see, e.g., [27]). To obtain a discrete-time channel model
with uncorrelated phase noise samples, the presence of an ideal interleaver and de-interleaver can be assumed (e.g.,
[16]). It is then possible to transform (28) into the discrete-time form

$$Y = X \cdot e^{j\Theta} + N, \tag{29}$$

in which the phase noise time samples $\Theta_i$ are modeled as i.i.d. and statistically independent of $X$. In Sec. IV-C
we instead discuss partially coherent channels with white phase noise. Discretization of such channels by means of
filtering and sampling at the symbol rate is possible, but does not lead to (29). Instead, the discrete-time channel
model must be modified to account for an effect we call spectral loss.

¹The term partially coherent was introduced to communications engineering by A. Viterbi in 1965 [18]. Viterbi possibly adopted the term
from physical optics, where it characterizes the temporal or spatial correlation of electrical fields that are neither coherent (fully correlated) nor
incoherent (uncorrelated) [19, Ch. X]. In communication and information theory, the term noncoherent (rather than incoherent) is used to refer
to channels that are entirely unable to transmit any phase information.

Before continuing with information rates, we remark that since the phase angle of AWGN is uniformly distributed, 
the order in which phase noise and AWGN act on the transmitted signal is irrelevant: 

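$$\begin{aligned}
Y &= (X + N) \cdot e^{j\Theta} \\
&= X \cdot e^{j\Theta} + N \cdot e^{j\Theta} \\
&= X \cdot e^{j\Theta} + N',
\end{aligned} \tag{30}$$

where $N' \sim \mathcal{N}_C(0, 2\sigma_n^2)$ has the same distribution as $N$.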

The circular PDF $p(Y_{\angle})$ can be obtained by circular convolution [28] of (17) with $p(\Theta)$. In numerical experiments,
it is usually more efficient to multiply the PDFs' discrete Fourier transforms (DFT) and perform an inverse DFT
(IDFT) to obtain the final result [29]. In particular, when the phase noise has a wrapped Gaussian distribution,
the DFT of (17) can be multiplied with the DFT of the "unwrapped" Gaussian (which is again Gaussian). The
subsequent IDFT will implicitly "wrap" the resulting PDF so that it maintains its periodicity with $2\pi$.
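The DFT-based circular convolution described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used for the results in this paper: both PDFs are sampled on a uniform grid over $[0, 2\pi)$, two wrapped Gaussians stand in for the phase PDFs because their circular convolution is known in closed form (a wrapped Gaussian whose variance parameter is the sum of the two), and the function names are hypothetical.

```python
import numpy as np

def wrapped_gaussian(theta, sigma, n_wraps=10):
    """Wrapped Gaussian PDF: a zero-mean Gaussian summed over shifts of 2*pi."""
    k = np.arange(-n_wraps, n_wraps + 1)[:, None]
    terms = np.exp(-(theta[None, :] + 2.0 * np.pi * k) ** 2 / (2.0 * sigma**2))
    return terms.sum(axis=0) / (sigma * np.sqrt(2.0 * np.pi))

def circular_convolve(pdf_a, pdf_b, d_theta):
    """Circular convolution of two sampled circular PDFs via DFT, product, IDFT."""
    spectrum = np.fft.fft(pdf_a) * np.fft.fft(pdf_b)
    return np.real(np.fft.ifft(spectrum)) * d_theta

n = 1024
d_theta = 2.0 * np.pi / n
theta = np.arange(n) * d_theta          # grid on [0, 2*pi); index 0 is angle 0

pdf_awgn_phase = wrapped_gaussian(theta, 0.3)   # stand-in for the AWGN-induced phase PDF
pdf_phase_noise = wrapped_gaussian(theta, 0.5)  # phase noise PDF
pdf_out = circular_convolve(pdf_awgn_phase, pdf_phase_noise, d_theta)

expected = wrapped_gaussian(theta, np.hypot(0.3, 0.5))
print(np.max(np.abs(pdf_out - expected)), np.sum(pdf_out) * d_theta)
```

Starting the grid at angle zero keeps the index arithmetic of the DFT aligned with the addition of angles; `np.fft.fftshift` can be applied afterwards if a plot over $[-\pi,\pi)$ is preferred.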

In the following discussion of partially coherent channels, the term SNR refers to the power ratio of signal and 
additive noise (7).

A. Input Optimization and Information Rate Calculation 

The capacity-achieving input distribution for the partially coherent channel (29) is not Gaussian [30], but it is
circularly symmetric [30], i.e., uniform in phase, and has discrete amplitude levels [16], [31]. In other words, the
capacity-achieving input distribution for the partially coherent channel consists of a number of continuous rings; the
number, radii and probabilities of these rings are subject to optimization. Interestingly, the shaping gain that can be
achieved by using non-equiprobable input symbols rather than a uniform square-area or circular-area distribution
is significantly larger than the maximum shaping gain of 1.53 dB for the AWGN channel [30]. Therefore, the
optimization of signal sets for the partially coherent channel may be more rewarding than for the AWGN channel.

The polar decomposition is useful for the analysis of partially coherent channels when both mixed terms are
small or can be neglected. As mentioned in Section III, this is the case for AWGN channels with Gaussian or ring
inputs. As the amplitude term $I(X_{\|};Y_{\|})$ is not affected by phase noise, it suffices to re-calculate the phase term
in the presence of phase noise. The conditional phase PDF $p(y_{\angle}|x_{\|},x_{\angle}=0)$ is obtained numerically or, where
possible, analytically from a circular convolution [28] of (17) with the phase noise PDF, usually (49) or (52).
Fig. 7 shows the decomposition results for the AWGN channel with Gaussian input with additional phase noise.
The phase noise has a wrapped Gaussian distribution with parameter $\sigma$ (footnote 2) as shown in Fig. The phase noise
parameter values are $\sigma = 0, 0.5, 1, 2$. For large $\sigma$, the circular variance goes to one and the wrapped Gaussian
distribution becomes uniform. In this case, no information can be transmitted in the signal phase and the phase
term tends to zero. An interesting observation can be made when $\sigma$ is small (but nonzero). In this case, the phase
term increases with increasing SNR, but tends towards a constant value asymptotically. When the phase term nearly
reaches this asymptote, the contribution of the phase term to $I(X;Y)$ gets small compared to that of the amplitude
term (which rises logarithmically with the SNR, cf. (14)). This statement is valid for any (arbitrarily low) phase
noise variance. Fig. 7 shows the amplitude term and the phase terms for $\sigma = 0, 0.5, 1, 2$ and the respective total
capacities. The (very small) contribution of the mixed term II was neglected.

Fig. 7. Polar decomposition of mutual information for an AWGN channel with Gaussian input with additional phase noise ($\sigma = 0, 0.5, 1, 2$).
The mixed term II is negligible.

²We remind the reader that $\sigma_n^2$ denotes the AWGN's variance per dimension, whereas $\sigma$ is the parameter of the wrapped Gaussian distribution.
Note that this distribution's circular variance is given by (51); it is not equal to $\sigma^2$.
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B. Noncoherent Channels 

The noncoherent channel is the limiting case of the partially coherent channel (29) when $\Theta$ is distributed uniformly
in $[-\pi, \pi)$. As the phase is completely randomized, the output phase carries no information and the phase term
and the mixed term I of the polar decomposition are zero. The fact that the mixed term II is zero, too, is a
consequence of $p(y_{\|} \mid x_{\|}, y_{\angle}) = p(y_{\|} \mid x, y_{\angle})$. The only information that can be transmitted over the noncoherent
channel is, therefore, represented by the amplitude term $I(X_{\|}; Y_{\|})$. An example of a noncoherent channel is the
previously mentioned optical direct-detection (DD) receiver, which can be modeled by $Y = |X + N|^2$.

A related but different situation occurs for channels that obey $Y = |X|^2 + N$. Channels of this kind are found in
a variety of optical communication scenarios, with different statistics for $N$. For example, in thermal-noise limited
DD receivers $N$ is a Gaussian process, but other noise statistics can be found for channels limited by (multiplied)
shot noise or by large amounts of optical background noise, both in fiber and free-space optical communications.
For a discussion of optical intensity channels with AWGN, see, e.g., [32]. While the phase term and the mixed
term II are zero in this case as for the noncoherent and DD channels, the mixed term I can be larger than zero.
Similarly, when the channel input is constrained to real-valued amplitude modulation, i.e., when the channel model
is $Y = X + N$, $X \in [0, \infty)$, the mixed term I can be larger than zero. The decomposition of the AWGN channel
with OOK modulation discussed in Section III-C is an example.

The conditional PDF $p(y_{\|} \mid x_{\|})$ of the noncoherent channel is Ricean, so the mutual information $I(X_{\|}; Y_{\|})$ is
calculated along the lines of the amplitude term calculation in Section III-A. The difficulty in finding the capacity
of the noncoherent channel lies in finding the optimum input distribution $p(x_{\|})$. Similar to the partially coherent
channel, it is known that the optimum input distribution $p(x)$ is not Gaussian [33], i.e., the optimum $p(x_{\|})$ is not a
Rayleigh distribution (9). Rather, the capacity-achieving input is discrete [16]. By numerical optimization, Ho found
an optimum input (for the optical DD channel) that has a discrete probability mass at $x_{\|} = 0$ and a continuous
exponential profile at $x_{\|} > 0$ [34]. At low SNRs, this distribution collapses to two discrete points at $x_{\|} = 0$ and at
Xh > 0, i.e., OOK, confirming a result reported in ID. 

An analytical approximation to the noncoherent channel's capacity is available in the limit of large SNRs. In
this case, the Ricean distribution $p(y_{\|} \mid x_{\|})$ can be approximated by a Gaussian, and the capacity-achieving input
distribution is a positive normal or half-Gaussian distribution [3]

(31)

In a derivation analogous to that of (14), the capacity is found to be [3]

(32)

which is $(\log_2 \pi - (1 + \gamma)/\ln 2 + 1)/2 \approx 0.19$ bits higher than the mutual information (14) that results from a
Rayleigh-distributed input. The same result was found in an analysis of optical DD systems [35]. Signal shaping
methods for the optical DD channel are discussed in [36].
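The 0.19 bit gap quoted for the half-Gaussian input can be checked numerically. The sketch below rests on an assumption that is not spelled out in the surviving text: at high SNR the output entropy approaches the input entropy, so the gap in mutual information equals the gap between the differential entropies of a half-Gaussian and a Rayleigh amplitude with the same mean square value. The variable `power` and the helper `entropy_bits` are introduced here for illustration only.

```python
import numpy as np

def entropy_bits(pdf_values, x):
    """Differential entropy in bits via trapezoidal integration of -p*log2(p)."""
    safe = np.maximum(pdf_values, 1e-300)       # avoids log2(0); such points contribute 0
    f = -pdf_values * np.log2(safe)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))

power = 1.0                                      # E{x^2}, assumed equal for both inputs
x = np.linspace(0.0, 12.0 * np.sqrt(power), 400_001)

# Half-Gaussian and Rayleigh amplitude PDFs, both normalized to E{x^2} = power.
half_gaussian = np.sqrt(2.0 / (np.pi * power)) * np.exp(-x**2 / (2.0 * power))
rayleigh = (2.0 * x / power) * np.exp(-x**2 / power)

gap_numeric = entropy_bits(half_gaussian, x) - entropy_bits(rayleigh, x)
gap_closed_form = (np.log2(np.pi) - (1.0 + np.euler_gamma) / np.log(2.0) + 1.0) / 2.0
print(f"numeric gap: {gap_numeric:.4f} bit, closed form: {gap_closed_form:.4f} bit")
```

Both numbers should agree to about three decimal places and round to the 0.19 bits stated in the text; the result does not depend on the chosen value of `power`.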
C. Spectral Loss Induced by White Phase Noise

As discussed above, certain types of phase noise induce spectral broadening. If the phase noise process $\Theta(t)$ is
white, i.e., if it is temporally uncorrelated, a related but qualitatively different effect occurs, which we call spectral
loss.

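To describe this effect, we use the continuous-time channel model (28). We derive the power spectral density
(PSD) $\Phi_Y(f)$ of $Y(t)$, assuming that $X(t)$ and $N_{\mathrm{PN}}(t) = e^{j\Theta(t)}$ are stationary, ergodic, and statistically independent
random processes. The autocorrelation function (ACF) $\varphi_Y(\tau)$ of $Y(t)$ is [37]

$$
\begin{aligned}
\varphi_Y(\tau) &= \mathcal{E}\{X(t) \cdot e^{j\Theta(t)} \cdot X^*(t+\tau) \cdot e^{-j\Theta(t+\tau)}\} + \mathcal{E}\{N(t) \cdot N^*(t+\tau)\} \\
&= \mathcal{E}\{X(t) \cdot X^*(t+\tau)\} \cdot \mathcal{E}\{e^{j\Theta(t)} \cdot e^{-j\Theta(t+\tau)}\} + \varphi_N(\tau) \\
&= \varphi_X(\tau) \cdot \varphi_{N_{\mathrm{PN}}}(\tau) + \varphi_N(\tau),
\end{aligned}
\tag{33}
$$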

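where $\mathcal{E}\{\cdot\}$ denotes the ensemble average. In calculating the ACF $\varphi_{N_{\mathrm{PN}}}(\tau)$ of $N_{\mathrm{PN}}(t)$, we assume for simplicity
that the phase noise follows a wrapped Gaussian distribution (49) with parameter $\sigma$. Since $\Theta(t)$ and $\Theta(t+\tau)$
are independent samples of a Gaussian random process, their sum or difference $\Theta'(t) = \Theta(t) \pm \Theta(t+\tau)$ satisfies
$\Theta'(t) \sim \mathcal{N}_{\mathbb{R}}(0, 2\sigma^2)$ for $\tau \neq 0$. The autocorrelation function $\varphi_{N_{\mathrm{PN}}}(\tau)$ of the phase noise process $N_{\mathrm{PN}}(t)$ is

$$
\varphi_{N_{\mathrm{PN}}}(\tau) = \begin{cases} 1, & \tau = 0, \\ e^{-\sigma^2}, & \tau \neq 0, \end{cases}
\tag{34}
$$

where the last result (for $\tau \neq 0$) is the resultant length (50) of an ergodic (wrapped) Gaussian random variable $\Theta'$
with zero mean and variance $2\sigma^2$:

$$
\rho_{\Theta'} = e^{-\frac{1}{2} \cdot 2\sigma^2} = e^{-\sigma^2}.
\tag{35}
$$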

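The piecewise defined ACF (34) can be written as

$$
\varphi_{N_{\mathrm{PN}}}(\tau) = e^{-\sigma^2} + \lim_{B \to \infty} \left(1 - e^{-\sigma^2}\right) \cdot \mathrm{sinc}(B\tau),
\tag{36}
$$

where $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$. By the Wiener-Khinchin theorem [37], the PSD $\Phi_{N_{\mathrm{PN}}}(f)$ of $N_{\mathrm{PN}}(t)$ is

$$
\Phi_{N_{\mathrm{PN}}}(f) = e^{-\sigma^2} \delta(f) + \lim_{B \to \infty} \left(1 - e^{-\sigma^2}\right) \cdot \frac{1}{B} \cdot \mathrm{rect}\!\left(\frac{f}{B}\right),
\tag{37}
$$

where

$$
\mathrm{rect}(f) = \begin{cases} 1, & |f| < \tfrac{1}{2}, \\ \tfrac{1}{2}, & |f| = \tfrac{1}{2}, \\ 0, & |f| > \tfrac{1}{2}, \end{cases} \;=\; \mathcal{F}(\mathrm{sinc}(t))
\tag{38}
$$

is the rectangular function [28]. Finally, the PSD $\Phi_Y(f)$ of $Y(t)$ is calculated using (33) and (37) as

$$
\begin{aligned}
\Phi_Y(f) &= \Phi_X(f) \star \Phi_{N_{\mathrm{PN}}}(f) + \Phi_N(f) \\
&= e^{-\sigma^2} \Phi_X(f) + \lim_{B \to \infty} \Phi_X(f) \star \left(1 - e^{-\sigma^2}\right) \frac{\mathrm{rect}(f/B)}{B} + \Phi_N(f).
\end{aligned}
\tag{39}
$$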
The $\star$ sign denotes convolution. Equation (39) explains the spectral effect of phase noise: the original PSD $\Phi_X(f)$
is preserved in shape, but attenuated by a factor $e^{-\sigma^2}$. The remaining signal power (a fraction of $1 - e^{-\sigma^2}$) is spread
over the entire spectrum from $-\infty$ to $+\infty$ through convolution with a rectangular function whose width goes to
infinity and whose height tends to zero. Because the fraction of power that leaks outside the original spectrum has
arbitrarily low power in any given finite band, it does not appear as spectral interference. Hence, we call the effect
spectral loss (in contrast to spectral broadening).
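The piecewise ACF of the phase noise process can be illustrated with a short Monte Carlo experiment. This is a sketch with assumed values (the parameter 0.8, the sample count and the seed are arbitrary choices made here): independent Gaussian phase samples are drawn, and the sample ACF of $e^{j\Theta}$ is compared with $e^{-\sigma^2}$ at nonzero lags. Wrapping the phase to a $2\pi$ interval is omitted because it does not change $e^{j\Theta}$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
sigma = 0.8                       # wrapped Gaussian parameter (hypothetical value)
n_samples = 400_000

# White phase noise: temporally independent Gaussian phase samples.
theta = rng.normal(0.0, sigma, n_samples)
n_pn = np.exp(1j * theta)

def sample_acf(z, lag):
    """Sample estimate of E{z(t) * conj(z(t + lag))} for a nonnegative integer lag."""
    if lag == 0:
        return np.mean(z * np.conj(z))
    return np.mean(z[:-lag] * np.conj(z[lag:]))

acf_0 = sample_acf(n_pn, 0).real
acf_1 = sample_acf(n_pn, 1).real
acf_5 = sample_acf(n_pn, 5).real
expected = np.exp(-sigma**2)
print(acf_0, acf_1, acf_5, expected)
```

The lag-zero value is exactly one, while all other lags scatter around $e^{-\sigma^2}$, which is the attenuation factor that appears in front of the original PSD.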

A remarkable feature of (39) is its simplicity: the original PSD $\Phi_X(f)$ is not broadened. We conclude from (39)
that filtering (at the signal bandwidth) of the output of the partially coherent channel with white phase noise and
sampling at the Nyquist rate produces a channel of the form

$$
Y = X \cdot e^{-\sigma^2/2} \cdot e^{j\Gamma} + N,
\tag{40}
$$

where $\Gamma$ is a random variable and the factor 1/2 appears since (40) is expressed in terms of amplitudes. In fact,
numerical simulations (with large $B$) show that $\Gamma$ approaches zero as $B$ increases. The resulting discrete-time model
for our channel is

$$
Y = X \cdot e^{-\sigma^2/2} + N.
\tag{41}
$$

Eq. (41) models an AWGN channel whose SNR is attenuated by a factor $e^{-\sigma^2}$, so this channel's capacity is

$$
C = \log_2\left(1 + \mathrm{SNR} \cdot e^{-\sigma^2}\right).
\tag{42}
$$

If the phase noise distribution is not (wrapped) Gaussian, the same calculation will lead to qualitatively similar
results, with the value of the ACF (34) at $\tau \neq 0$ determining the spectral loss factor.
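The effect of spectral loss on the Gaussian-input capacity can be tabulated with a few lines of code. The sketch below assumes the capacity expression $\log_2(1 + \mathrm{SNR} \cdot e^{-\sigma^2})$ for the wrapped Gaussian case; the function names and the 20 dB operating point are illustrative choices, not values from the paper.

```python
import numpy as np

def spectral_loss_capacity(snr_db, sigma):
    """Capacity in bit/symbol of Y = X*exp(-sigma^2/2) + N with a Gaussian input."""
    snr = 10.0 ** (np.asarray(snr_db, dtype=float) / 10.0)
    return np.log2(1.0 + snr * np.exp(-sigma**2))

def snr_penalty_db(sigma):
    """SNR reduction in dB caused by the power attenuation factor exp(-sigma^2)."""
    return 10.0 * np.log10(np.e) * sigma**2

for sigma in (0.0, 0.5, 1.0, 2.0):
    capacity = float(spectral_loss_capacity(20.0, sigma))
    print(f"sigma={sigma}: C={capacity:.3f} bit/symbol, penalty={snr_penalty_db(sigma):.2f} dB")
```

The penalty grows as about 4.34 dB per unit of $\sigma^2$, so moderate phase noise parameters already remove a large part of the received SNR.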

We remark that (39) has an important implication for numerical simulations of phase noise. Due to the infinite
spectral broadening of the power, the output signal $Y(t)$ has infinite bandwidth and is therefore necessarily
undersampled in numerical simulations with finite bandwidth. Therefore, the numerical simulation of phase noise
will create aliasing inside and outside the original signal band through convolution of $\Phi_X(f)$ with a rectangular
function of finite width and nonzero height. To keep this aliasing effect small in numerical simulations, it is necessary
to oversample $X(t)$ by a sufficiently large factor and to filter the spurious out-of-band noise.



D. Capacity of Nonlinear Fiber-optic Communication Channels 

Fiber-optic systems are one example of a channel that can be impaired by phase noise. It is therefore tempting
to apply the channel model with phase noise and spectral loss introduced above to estimate the channel capacity of
fiber-optic systems in certain cases. Such systems either transport a single channel or carry multiple channels via
wavelength division multiplexing (WDM). In general, capacity calculations for this channel are very difficult due to
the medium nonlinearity, its interaction with linear channel effects, and the distributed nature of noise, nonlinearity
and dispersion. A method for estimating this channel's capacity from exhaustive numerical simulations is proposed
in [17], where results for different physical scenarios are reported. We first emphasize that the curves in [17, Fig. 36]
are for a fixed AWGN variance, except for Curve (2) where the AWGN variance is set to zero. Hence increasing SNR
at fixed system length refers to increasing the transmit signal power. At low signal power levels, the fiber channel
is dominated by amplified spontaneous emission (ASE) noise from optical amplifiers and can be characterized as
an AWGN channel. With increasing signal power, distortions from nonlinear fiber effects increase faster than the
SNR, eventually bringing the channel capacity down to zero. Cross-phase modulation (XPM) is identified as the
most relevant effect for the channel capacity of WDM systems [17, Fig. 36]. XPM causes a modulation of the
signal phase in one WDM channel by the instantaneous power levels of all co-propagating channels [25, Ch. 7].
Single-channel systems have a higher capacity because of the absence of such inter-channel nonlinearities [17,
Fig. 36, Curves (3) and (4)]. There, the fundamentally limiting effect involves the nonlinear interaction of signal
and ASE noise.

Separate results for two special cases give insights into the origin of the capacity limitations [17, Sec. XI-E]: (1)
If XPM is suppressed (by transmitting one channel only), then the capacity starts decreasing at a much higher SNR
than with WDM, see [17, Fig. 36, Curves (3) and (4)]. (2) In the "unphysical" case where ASE noise is absent
(but all WDM channels are present), the capacity is still limited by XPM, see [17, Fig. 36, Curve (2)].

In the following, we will concentrate on the single-channel case (with optical filtering) which is limited by
nonlinear signal-noise interaction [17, Fig. 36, Curve (3)]. In contrast to all other cases considered in [17], the
capacity for the single-channel system setup decreases sharply with SNR. To reproduce this curve with the channel
model, we assume that the phase noise variance $\sigma^2$ in (41) scales quadratically with $P_s$ (i.e., $\sigma^2 = c \cdot P_s^2$,
where $c$ is a constant). Stated differently, we assume that the amplitude of the phase shift fluctuations induced by
SPM scales linearly with the signal power.

Using this model, a rapid capacity loss occurs (see (42)) if the channel suffers from spectral loss. More precisely,
at high powers $P_s$, (42) gives



The capacity curve [17, Fig. 36, Curve (3)] was produced using a 16-ring input. Instead of using (42), which
holds for a bidimensional Gaussian input, we calculate the polar decomposition's amplitude and phase terms for the
channel model (41) with a 16-ring input. The (very small) mixed term II is neglected. A good fit of the resulting
capacity curve with 1 17, Fig. 36, Curve (3)] is obtained for c= 1.1 • 10^ W^^, see Fig. [8] The WDM system capacity 
curve [17, Fig. 36, Curve (1)] is shown in red for reference. We observe that the channel model (41) reproduces
the sharp capacity decline in the high-power region well. However, the spectral loss model (41) with $\sigma^2 = c \cdot P_s^2$
exhibits a sharp capacity roll-off that does not match the shape of the WDM curve shown in Fig. 8. This model
of spectral loss is clearly insufficient to explain the WDM curve and additional investigations are needed to find
mechanisms that would reproduce the WDM curve.
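The qualitative shape of the capacity curve under the quadratic scaling $\sigma^2 = c \cdot P_s^2$ can be reproduced with the Gaussian-input formula (42). This is only a sketch: it uses a Gaussian input rather than the 16-ring input of the fit, and both the noise power `p_noise` and the constant `c` below are hypothetical placeholders, not the fitted values. Under this model the received SNR is proportional to $P_s \cdot e^{-c P_s^2}$, which is maximized at $P_s = 1/\sqrt{2c}$.

```python
import numpy as np

def capacity_vs_power(p_signal, p_noise, c):
    """Gaussian-input capacity in bit/symbol with spectral loss, sigma^2 = c * p_signal^2."""
    snr = p_signal / p_noise
    return np.log2(1.0 + snr * np.exp(-c * p_signal**2))

p_noise = 1e-6                              # hypothetical noise power in W
c = 1e4                                     # hypothetical constant in 1/W^2
p_signal = np.logspace(-5, -1.5, 2000)      # 10 uW up to about 32 mW

cap = capacity_vs_power(p_signal, p_noise, c)
i_max = int(np.argmax(cap))
p_opt_numeric = p_signal[i_max]
p_opt_analytic = 1.0 / np.sqrt(2.0 * c)     # stationary point of P * exp(-c * P^2)
print(p_opt_numeric, p_opt_analytic, cap[i_max], cap[-1])
```

Below the optimum power the curve follows the usual logarithmic AWGN capacity; above it, the exponential attenuation dominates and the capacity drops quickly.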
Fig. 8. Capacities of the fiber-optic channel (16-ring constellation input, single propagating channel) modeled as partially coherent channel
(41) with wrapped Gaussian phase noise distribution with $\sigma^2 = c \cdot P_s^2$. Dotted lines show numerical results from [17, Fig. 36, Curves (1) and
(3)] (single channel (black) and WDM system (red)). Upper x-axis shows $P_s$ in dBm, lower x-axis shows SNR in dB (8).



Finally, we would like to mention that a reviewer of this paper pointed out a discrepancy between the numerical
results [17] and an analytical model with phase noise and spectral loss (such as (40) or (41)) if the noise $N$ is set to
zero. In this case, the information rate for large but finite $P_s$ will clearly be $\log_2(r)$, where $r$ is the number of rings.
The capacity therefore does not decrease with $P_s$. This is not supported by the results in [17, Fig. 36, Curve (2)]
and it shows that spectral loss cannot completely account for the capacity reduction at high signal power. This is
especially apparent if the ASE noise power is small. Thus, as emphasized above, spectral loss should be considered
as only one mechanism by which fiber capacity can exhibit a maximum and approach zero at high signal powers.

V. Conclusion 

We have presented a polar decomposition of the mutual information between a complex-valued channel input 
and its output. This decomposition yields two main terms, an amplitude term and a phase term, and two "mixed" 
terms that are small or zero in many cases. The decomposition was performed for the AWGN channel with a 
Gaussian input (for which asymptotic analytical approximations are derived), a phase-modulated input, and with 
discrete input constellations. 

Partially coherent channels are channels with AWGN and additional phase noise. The decomposition amplitude 
term of such channels is not affected by phase noise. In contrast, the decomposition phase term is bounded because 
of phase noise. A property of partially coherent channels with white phase noise that we call spectral loss was 
derived and discussed. Effectively, this loss decreases the received SNR; hence, the decomposition amplitude term is
affected by phase noise, too. Spectral loss must be taken into account in the analysis of channels impaired by phase 
noise as well as in their numerical simulation. A particularly interesting example of a partially coherent channel 
is the nonlinear fiber-optic channel. Capacity results for optical channels limited by signal-noise interaction were 
calculated. 

Finally, the polar decomposition is useful to understand the fundamental impairments of channels such as partially 
coherent channels and their optimizing input constellations. The decomposition is a practical tool for a rapid 
numerical evaluation of mutual information in cases where the mixed terms are small and the complex-valued 
channel can be effectively decomposed into two independent one-dimensional channels. 
Appendix A
Review of directional statistics

Random variables such as phase angles or points on a spherical surface cannot be treated with "conventional"
statistical methods. (E.g., the average wind direction calculated from two measurements of 358° and 2° is not
180°.) The field that deals with such directional (in contrast to linear) random variables is known as directional
statistics [38].
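The wind direction example can be made concrete with a few lines of code. The sketch below computes the mean direction as the argument of the sample average of $e^{j\theta}$, which is the standard approach in directional statistics; the function names are chosen here for illustration.

```python
import numpy as np

def mean_direction_deg(angles_deg):
    """Mean direction in degrees: argument of the sample average of exp(j*theta)."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.rad2deg(np.angle(np.mean(np.exp(1j * theta)))) % 360.0

def resultant_length(angles_deg):
    """Magnitude of the sample average of exp(j*theta), between 0 and 1."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.abs(np.mean(np.exp(1j * theta)))

angles = [358.0, 2.0]
arithmetic_mean = np.mean(angles)            # misleading for angles: 180.0
circular_mean = mean_direction_deg(angles)   # close to 0 degrees (mod 360)
rho = resultant_length(angles)               # close to 1: the two directions nearly agree
print(arithmetic_mean, circular_mean, rho)
```

Because of floating-point rounding, the circular mean may be printed as a value just below 360 rather than exactly 0; both represent the same direction.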



A. Trigonometric moments 

We restrict our review to one-dimensional directional (or circular) random variables, e.g., phase angles. Such a
random variable $\Theta$ is defined on an arbitrary interval of length $2\pi$ and has a periodic probability density function
(PDF) that satisfies

$$
\int_{c}^{c+2\pi} p(\theta)\, d\theta = 1, \qquad c \in \mathbb{R}.
\tag{44}
$$

To ensure that the statistical moments of the directional random variable are invariant under a rotation of the
coordinate system, the trigonometric moments are calculated from $e^{j\Theta}$ rather than from $\Theta$. The $i$-th trigonometric
moment of $\Theta$ is defined as [39]

$$
\mathcal{E}\{(e^{j\Theta})^i\} = \int_{-\pi}^{\pi} (e^{j\theta})^i\, p(\theta)\, d\theta.
\tag{45}
$$

The first trigonometric moment can be calculated as

$$
\mathcal{E}\{e^{j\Theta}\} = \int_{-\pi}^{\pi} e^{j\theta} p(\theta)\, d\theta = \rho_\Theta \cdot e^{j\mu_\Theta},
\tag{46}
$$

where $\rho_\Theta$ is the resultant length and $\mu_\Theta$ is the mean direction of $\Theta$ [38]. The $i$-th central trigonometric moment is
calculated as the $i$-th trigonometric moment of $\Theta - \mu_\Theta$.
To quantify the concentration (or, inversely, the dispersion) of a circular random variable $\Theta$, it is common to
define the circular variance as [38], [39]

$$
V_\Theta^\circ = 1 - \left|\mathcal{E}\{e^{j\Theta}\}\right| = 1 - \rho_\Theta.
\tag{47}
$$

Clearly, the circular variance is maximized if $\Theta$ is uniformly distributed ($V_\Theta^\circ = 1$) and minimized for a constant $\Theta$
($V_\Theta^\circ = 0$). It must be noted that the circular standard deviation is not defined as $\sqrt{V_\Theta^\circ}$, but as [39]

$$
\sigma_\Theta^\circ = \sqrt{-2 \ln(1 - V_\Theta^\circ)} = \sqrt{-2 \ln \rho_\Theta}.
\tag{48}
$$

B. Circular distributions 

An example of a circular distribution has been introduced above: the PDF of the phase angle of a complex phasor
corrupted by complex-valued AWGN. This distribution ranges from a
uniform distribution (in any $2\pi$ interval) for small SNRs to a Gaussian distribution for large SNRs. Middleton gives
a series expansion of this PDF [10, § 9.2-2], which has been applied in the context of systems with phase noise (cf.
references given in [26, Appendix 4.A]).

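1) Wrapped Gaussian distribution: Another important circular distribution is the wrapped Gaussian distribution
[40], [41]:

$$
p(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{k=-\infty}^{\infty} \exp\left(-\frac{(\theta - \mu + 2\pi k)^2}{2\sigma^2}\right).
\tag{49}
$$

This distribution occurs when a linear random variable $X \sim \mathcal{N}_{\mathbb{R}}(\mu, \sigma^2)$ is "wrapped" around a circle, i.e., $\Theta = X
\bmod 2\pi$.

The mean direction $\mu_\Theta$, resultant length $\rho_\Theta$ and circular variance $V_\Theta^\circ$ of a wrapped Gaussian random variable
can be calculated as [41]

$$
\mu_\Theta = \mu \bmod 2\pi, \qquad \rho_\Theta = e^{-\sigma^2/2}
\tag{50}
$$

and

$$
V_\Theta^\circ = 1 - e^{-\sigma^2/2}.
\tag{51}
$$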



The wrapped Gaussian approaches a uniform distribution for large $\sigma$ and can be approximated by a Gaussian
distribution for small $\sigma$ as shown in Fig. 9.
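The wrapped Gaussian PDF and its circular statistics are easy to evaluate numerically. The following sketch truncates the infinite sum to a finite number of wraps (an approximation chosen here, with `n_wraps` as an illustrative parameter) and checks that the resultant length computed from the first trigonometric moment agrees with $e^{-\sigma^2/2}$.

```python
import numpy as np

def wrapped_gaussian_pdf(theta, mu, sigma, n_wraps=50):
    """Wrapped Gaussian PDF with the infinite sum truncated to |k| <= n_wraps."""
    k = np.arange(-n_wraps, n_wraps + 1)[:, None]
    z = theta[None, :] - mu + 2.0 * np.pi * k
    return np.exp(-z**2 / (2.0 * sigma**2)).sum(axis=0) / (np.sqrt(2.0 * np.pi) * sigma)

n = 4096
dtheta = 2.0 * np.pi / n
theta = -np.pi + dtheta * np.arange(n)          # uniform grid on [-pi, pi)

results = {}
for sigma in (0.5, 1.0, 2.0):
    p = wrapped_gaussian_pdf(theta, 0.0, sigma)
    first_moment = np.sum(p * np.exp(1j * theta)) * dtheta   # estimate of E{exp(j*Theta)}
    rho = np.abs(first_moment)                               # resultant length
    results[sigma] = {
        "resultant_length": rho,
        "circular_variance": 1.0 - rho,
        "circular_std": np.sqrt(-2.0 * np.log(rho)),
    }
    print(sigma, results[sigma], np.exp(-sigma**2 / 2.0))
```

For the wrapped Gaussian, the circular standard deviation recovers the parameter $\sigma$ itself, whereas the circular variance saturates at one for large $\sigma$.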

2) Von Mises distribution: While the wrapped Gaussian distribution shares some of the properties of the linear
Gaussian distribution [42], it does not maximize the entropy for a given (circular) variance. This condition is met
by the von Mises distribution [39], [40]

$$
p(\theta) = \frac{e^{\kappa \cos(\theta - \mu)}}{2\pi \mathrm{I}_0(\kappa)},
\tag{52}
$$

where $\mu$ is the circular mean (and is usually called the centrality parameter), $\kappa$ is the concentration parameter and
$\mathrm{I}_0(\cdot)$ is the modified Bessel function of the first kind with order zero. In engineering, the von Mises distribution is
known as the Tikhonov distribution (after V. I. Tikhonov) [43]; it appears in the description of the phase error of
phase-locked loops Q. 
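The circular variance is calculated using (47) with (46) as

$$
\begin{aligned}
V_\Theta^\circ &= 1 - \left| \int_{-\pi}^{\pi} e^{j\theta} p(\theta)\, d\theta \right| \\
&= 1 - \left| \frac{1}{2\pi \mathrm{I}_0(\kappa)} \int_{-\pi}^{\pi} e^{\kappa \cos\theta} (\cos\theta + j \sin\theta)\, d\theta \right| \\
&= 1 - \frac{1}{\pi \mathrm{I}_0(\kappa)} \int_{0}^{\pi} e^{\kappa \cos\theta} \cos\theta\, d\theta = 1 - \frac{\mathrm{I}_1(\kappa)}{\mathrm{I}_0(\kappa)}.
\end{aligned}
\tag{53}
$$

To obtain (53), we use the modified Bessel functions of the first kind of order $n$ defined as (see [9])

$$
\mathrm{I}_n(\kappa) = \frac{1}{\pi} \int_{0}^{\pi} e^{\kappa \cos x} \cos(nx)\, dx.
\tag{54}
$$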

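The differential entropy is calculated as

$$
\begin{aligned}
h(\Theta) &= -\int_{-\pi}^{\pi} \frac{e^{\kappa \cos\theta}}{2\pi \mathrm{I}_0(\kappa)} \ln \frac{e^{\kappa \cos\theta}}{2\pi \mathrm{I}_0(\kappa)}\, d\theta \\
&= \frac{\ln(2\pi \mathrm{I}_0(\kappa))}{2\pi \mathrm{I}_0(\kappa)} \int_{-\pi}^{\pi} e^{\kappa \cos\theta}\, d\theta - \frac{\kappa}{2\pi \mathrm{I}_0(\kappa)} \int_{-\pi}^{\pi} e^{\kappa \cos\theta} \cos\theta\, d\theta \\
&= \ln(2\pi \mathrm{I}_0(\kappa)) - \kappa \cdot \frac{\mathrm{I}_1(\kappa)}{\mathrm{I}_0(\kappa)},
\end{aligned}
\tag{55}
$$

where (54) was used twice in the last equality.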
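The closed-form circular variance and differential entropy of the von Mises distribution can be cross-checked by direct numerical integration. The sketch below evaluates the Bessel functions from their integral representation; the concentration value `kappa = 2.0` and the helper names are illustrative choices, and the entropy is expressed in nats.

```python
import numpy as np

def trapezoid(f, x):
    """Trapezoidal rule, written out to avoid depending on a specific NumPy version."""
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))

def bessel_i(n, kappa, n_grid=20_001):
    """Modified Bessel function of the first kind, via (1/pi) * int_0^pi exp(k cos x) cos(nx) dx."""
    x = np.linspace(0.0, np.pi, n_grid)
    return trapezoid(np.exp(kappa * np.cos(x)) * np.cos(n * x), x) / np.pi

kappa = 2.0                                   # hypothetical concentration parameter
theta = np.linspace(-np.pi, np.pi, 20_001)
p = np.exp(kappa * np.cos(theta)) / (2.0 * np.pi * bessel_i(0, kappa))  # von Mises, mu = 0

ratio = bessel_i(1, kappa) / bessel_i(0, kappa)
v_numeric = 1.0 - np.abs(trapezoid(p * np.exp(1j * theta), theta))
v_closed = 1.0 - ratio
h_numeric = trapezoid(-p * np.log(p), theta)
h_closed = np.log(2.0 * np.pi * bessel_i(0, kappa)) - kappa * ratio
print(v_numeric, v_closed, h_numeric, h_closed)
```

As an independent check, `bessel_i(0, kappa)` can be compared with NumPy's built-in `np.i0(kappa)`.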

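Among all linear distributions that satisfy an average power (or variance) constraint $\mathcal{E}\{X^2\} \leq P$, the Gaussian
distribution maximizes the differential entropy $h(X)$ [7]. Similarly, one can ask for the circular distribution $p(\theta)$
that maximizes $h(\Theta)$ under a circular variance constraint $V_\Theta^\circ \leq A$. Without loss of generality, we assume $\mu_\Theta = 0$,
which means that $\mathcal{E}\{e^{j\Theta}\}$ is a non-negative real number and that $\mathcal{E}\{\sin\Theta\} = 0$. We can thus write the circular
variance constraint as

$$
\begin{aligned}
V_\Theta^\circ &= 1 - \Bigg| \int_{-\pi}^{\pi} p(\theta) \cos\theta\, d\theta + j \underbrace{\int_{-\pi}^{\pi} p(\theta) \sin\theta\, d\theta}_{=0} \Bigg| \\
&= 1 - \mathcal{E}\{\cos\Theta\} \leq A.
\end{aligned}
\tag{56}
$$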



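To prove that the von Mises distribution (52) maximizes the differential entropy under the circular variance
constraint (56), we calculate the Kullback-Leibler distance between the von Mises distribution $p(\theta)$ and an arbitrary
other distribution $q(\theta)$:

$$
\begin{aligned}
D(q\|p) &= \int_{-\pi}^{\pi} q(\theta) \ln \frac{q(\theta)}{p(\theta)}\, d\theta \\
&= \underbrace{\int_{-\pi}^{\pi} q(\theta) \ln q(\theta)\, d\theta}_{-h(q)} - \int_{-\pi}^{\pi} q(\theta) \ln p(\theta)\, d\theta \\
&= -h(q) - \int_{-\pi}^{\pi} q(\theta) \ln \frac{e^{\kappa \cos\theta}}{2\pi \mathrm{I}_0(\kappa)}\, d\theta \\
&= -h(q) + \ln(2\pi \mathrm{I}_0(\kappa)) - \kappa \cdot \int_{-\pi}^{\pi} q(\theta) \cos\theta\, d\theta \\
&= -h(q) + \ln(2\pi \mathrm{I}_0(\kappa)) - \kappa \cdot \underbrace{\mathcal{E}_{q(\theta)}\{\cos\Theta\}}_{\geq 1 - A} \\
&\leq -h(q) + h(p),
\end{aligned}
\tag{57}
$$

where $h(q)$ denotes the differential entropy $h(\Theta)$ of a random variable $\Theta \sim q(\theta)$ and where $\kappa$ is chosen to satisfy
$1 - A = \mathrm{I}_1(\kappa)/\mathrm{I}_0(\kappa)$. Recall that $D(q\|p) \geq 0$ with equality if and only if $p = q$ [7]. Hence, we find that

$$
h(p) \geq h(q)
\tag{58}
$$

with equality if and only if $q = p$.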
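The maximum entropy property can be illustrated by comparing a von Mises and a wrapped Gaussian distribution that share the same circular variance. This is a numerical sketch, not part of the proof: the bisection solver, the grid sizes, and the three $\sigma$ values are choices made here, and entropies are in nats.

```python
import numpy as np

def trapezoid(f, x):
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))

def bessel_i(n, kappa, n_grid=4001):
    """Modified Bessel function of the first kind from its integral representation."""
    x = np.linspace(0.0, np.pi, n_grid)
    return trapezoid(np.exp(kappa * np.cos(x)) * np.cos(n * x), x) / np.pi

def kappa_for_resultant_length(rho, lo=1e-9, hi=200.0, n_iter=80):
    """Solve I1(kappa)/I0(kappa) = rho by bisection (the ratio increases with kappa)."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if bessel_i(1, mid) / bessel_i(0, mid) < rho:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def wrapped_gaussian_entropy(sigma, n=4096, n_wraps=50):
    """Differential entropy of a zero-mean wrapped Gaussian on a uniform periodic grid."""
    dtheta = 2.0 * np.pi / n
    theta = -np.pi + dtheta * np.arange(n)
    k = np.arange(-n_wraps, n_wraps + 1)[:, None]
    z = theta[None, :] + 2.0 * np.pi * k
    p = np.exp(-z**2 / (2.0 * sigma**2)).sum(axis=0) / (np.sqrt(2.0 * np.pi) * sigma)
    return -np.sum(p * np.log(p)) * dtheta

entropies = {}
for sigma in (0.5, 1.0, 2.0):
    rho = np.exp(-sigma**2 / 2.0)               # resultant length of the wrapped Gaussian
    kappa = kappa_for_resultant_length(rho)     # von Mises with the same circular variance
    h_von_mises = np.log(2.0 * np.pi * bessel_i(0, kappa)) - kappa * rho
    h_wrapped = wrapped_gaussian_entropy(sigma)
    entropies[sigma] = (h_von_mises, h_wrapped)
    print(sigma, kappa, h_von_mises, h_wrapped)
```

In each case the von Mises entropy comes out slightly larger, and the small size of the gap reflects how similar the two shapes are.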

A different path to get to the same result is to note that the von Mises distribution is a special case of the maximum
entropy distribution [7, p. 267]. With the constraint (56), the maximum entropy distribution with coefficients $\lambda_0 =
-\ln(2\pi \mathrm{I}_0(\kappa))$ and $\lambda_1 = \kappa$ transforms into (52). Barakat finds the same result using Lagrange multipliers [44].
Observe that the von Mises distribution becomes uniform for large circular variance (small $\kappa$) and approaches a
Gaussian distribution with variance $\sigma^2 = 1/\kappa$ when the circular variance is small ($\kappa$ large) [40]. Fig. 9 shows the
wrapped Gaussian PDF for $\mu = 0$ (i.e., $\mu_\Theta = 0$) and various values of $\sigma$.

Because of its maximum entropy property, the von Mises distribution is often considered to be the circular 
analogue of the linear normal distribution. Hence, it is sometimes referred to as the circular normal distribution; to 
avoid confusion with the wrapped Gaussian distribution, it is advisable not to use this term. The wrapped Gaussian 
and the von Mises distribution have a very similar shape [44], see Fig. 9. In practice, one often uses whichever is
more convenient.

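3) Truncated Gaussian distribution: Suppose now for the sake of argument that the phase constraint is the usual
second-order constraint $\mathcal{E}\{\Theta^2\} \leq A$, where the expectation is performed over the interval $[-\pi, \pi)$. Suppose further
that we wish to maximize the entropy over all PDFs with $\mathcal{E}\{\Theta\} = 0$ (the latter constraint is made to simplify
the discussion). Consider the truncated Gaussian distribution

$$
p(\theta) = \frac{\lambda}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\theta^2}{2\sigma^2}\right), \qquad -\pi \leq \theta < \pi,
\tag{59}
$$

where $\lambda$ is a scaling constant that ensures (44) is valid, and $\sigma^2$ is chosen so that $\mathcal{E}\{\Theta^2\} = A$. We compute

$$
\begin{aligned}
h(\Theta) &= \int_{-\pi}^{\pi} -p(\theta) \ln p(\theta)\, d\theta \\
&= \frac{1}{2} \ln\left(\frac{2\pi\sigma^2}{\lambda^2}\right) + \frac{1}{2\sigma^2} \underbrace{\int_{-\pi}^{\pi} \theta^2 p(\theta)\, d\theta}_{=\mathcal{E}\{\Theta^2\} = A} \\
&= \frac{1}{2} \ln\left(\frac{2\pi\sigma^2}{\lambda^2}\right) + \frac{A}{2\sigma^2}.
\end{aligned}
\tag{60}
$$

We further have

$$
\begin{aligned}
D(q\|p) &= -h(q) - \int_{-\pi}^{\pi} q(\theta) \ln p(\theta)\, d\theta \\
&= -h(q) + \frac{1}{2} \ln\left(\frac{2\pi\sigma^2}{\lambda^2}\right) + \frac{1}{2\sigma^2} \mathcal{E}_{q(\theta)}\{\Theta^2\} \\
&\leq -h(q) + h(p).
\end{aligned}
\tag{61}
$$

Using $D(q\|p) \geq 0$ with equality if and only if $q = p$, we find that a truncated Gaussian distribution maximizes
entropy.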

Fig. 9 shows the PDFs for the truncated Gaussian distribution for $\mathcal{E}\{\Theta\} = 0$ and various values of $\sigma$ (wrapped
and truncated Gaussians) and $\kappa = 1/\sigma^2$ (von Mises). We remark that the physical meaning of our second-order
constraint is unclear, but the same can be said for the circular variance constraint. It is interesting, however, that
maximum entropy considerations lead to either a von Mises distribution or a truncated Gaussian distribution. Two
interesting problems are whether the wrapped Gaussian distribution is maximum-entropy under some natural circular
constraint, and whether the wrapped Gaussian has other natural "normal" properties [42].